Combining Methods to Create Synthetic Microdata: Quantile Regression, Hot Deck, and Rank Swapping

Authors

  • Jennifer Huckett
  • Michael D. Larsen

Abstract

Government agencies must simultaneously disseminate useful microdata and maintain the confidentiality of individual records. Releasing synthetic data is one approach. We propose to create synthetic data using a combination of quantile regression, hot deck imputation, and rank swapping. The result is a releasable data set containing original values for a few key variables, synthetic quantile regression predictions for several variables, and imputed and perturbed values for the remaining variables. The procedure is intended to provide high-quality data to users while protecting the confidentiality of respondents. The method is illustrated by creating synthetic data for a Public Use Microdata Set from the American Community Survey.
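The abstract names rank swapping as the perturbation step but does not state its parameters. As a minimal sketch, assuming the common textbook formulation in which each value may be exchanged with at most one partner whose rank lies within a fraction `p` of the record count (the function name `rank_swap` and the parameters `p` and `seed` are hypothetical, not taken from the paper), a single numeric variable could be rank-swapped like this:

```python
import random
from typing import List, Sequence


def rank_swap(values: Sequence[float], p: float, seed: int = 0) -> List[float]:
    """Return a rank-swapped copy of `values` (hypothetical helper, generic method).

    Each value is exchanged with at most one partner whose rank differs by no
    more than max(1, int(p * n)) positions, where n = len(values).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    n = len(values)
    window = max(1, int(p * n))
    # Record indices ordered by the value they hold: order[rank] -> record index.
    order = sorted(range(n), key=lambda i: values[i])
    swapped = [False] * n  # indexed by rank
    out = list(values)
    for r in range(n):
        if swapped[r]:
            continue
        # Not-yet-swapped ranks within the allowed window above rank r.
        candidates = [s for s in range(r + 1, min(n, r + window + 1)) if not swapped[s]]
        if not candidates:
            continue
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        # Exchange the two records' values; both are then frozen.
        out[i], out[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return out


if __name__ == "__main__":
    incomes = [12.0, 55.0, 31.0, 78.0, 44.0, 20.0, 67.0, 39.0]
    print(rank_swap(incomes, p=0.25, seed=3))
```

Because values are only exchanged among records, the marginal distribution of the swapped variable is preserved exactly; a larger `p` perturbs record-level values more (more protection) at the cost of weakening relationships with the other variables.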


Similar resources

Measuring Disclosure Risk for a Synthetic Data Set Created Using Multiple Methods

Government agencies must simultaneously maintain confidentiality of individual records and disseminate useful microdata. We propose a method to create synthetic data that combines quantile regression, hot deck imputation, and rank swapping. The result from implementation of the proposed procedure is a releasable data set containing original values for a few key variables, synthetic quantile reg...


LHS-Based Hybrid Microdata vs Rank Swapping and Microaggregation for Numeric Microdata Protection

In previous work by Domingo-Ferrer et al., rank swapping and multivariate microaggregation have been identified as well-performing masking methods for microdata protection. Recently, Dandekar et al. proposed using synthetic microdata, as an option, in place of original data by using Latin hypercube sampling (LHS) technique. The LHS method focuses on mimicking univariate as well as multivariate s...


Re-identification Methods for Masked Microdata

Statistical agencies often mask (or distort) microdata in public-use files so that the confidentiality of information associated with individual entities is preserved. The intent of many of the masking methods is to cause only minor distortions in some of the distributions of the data and possibly no distortion in a few aggregate or marginal statistics. In record linkage (as in nearest neighbor ...


The Impact of Alternative Imputation Methods on the Measurement of Income and Wealth: Evidence from the Spanish Survey of Household Finances

The goal of this paper is to emphasise the importance of the way of handling missing data and its impact on the outcome of empirical studies. Using the 2002 wave of the Spanish Survey of Household Finances (EFF), I study the performance of alternative methods: listwise deletion, non-stochastic, multiple and single imputation based on linear-regression models, and hot-deck procedures. Using desc...


Community-Wide Health Risk Assessment Using Geographically Resolved Demographic Data: A Synthetic Population Approach

BACKGROUND Evaluating environmental health risks in communities requires models characterizing geographic and demographic patterns of exposure to multiple stressors. These exposure models can be constructed from multivariable regression analyses using individual-level predictors (microdata), but these microdata are not typically available with sufficient geographic resolution for community risk...



Publication date: 2008